Type-based and Token-based Learning of Kanji Morphemes

Author

  • Kyo KAGEURA
Abstract

We have been developing methods of kanji morpheme analysis for the empirical modelling of terminology. In this paper we discuss the performance of kanji morpheme extraction and kanji sequence decomposition, both based on the same bigram statistics, focusing on the effect of type-based and token-based training. The experiment shows that type-based training gives consistently better performance, a result of both practical and theoretical importance.
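
The abstract does not spell out the bigram model, so the following is only a minimal sketch of the type/token distinction under stated assumptions, not the author's method. It assumes a training list that maps each kanji term to its corpus frequency. `bigram_counts` counts adjacent-character bigrams either once per distinct term (type-based) or once per occurrence (token-based). The `decompose` function, its relative-frequency threshold, and the toy term list are all hypothetical. They were chosen only to show how token weighting lets one frequent compound inflate the bigram that spans its internal morpheme boundary (報検 in 情報検索).

```python
from collections import Counter


def bigram_counts(term_freqs: dict[str, int], token_based: bool) -> Counter:
    """Count adjacent-character bigrams over a term list.

    term_freqs maps each distinct term (a kanji string) to its corpus frequency.
    Type-based: every distinct term contributes once.
    Token-based: every term contributes once per occurrence (weighted by freq).
    """
    counts: Counter = Counter()
    for term, freq in term_freqs.items():
        weight = freq if token_based else 1
        for left, right in zip(term, term[1:]):
            counts[left + right] += weight
    return counts


def decompose(sequence: str, counts: Counter, threshold: float) -> list[str]:
    """Hypothetical splitter (not from the paper): break the sequence wherever
    the relative frequency of the bigram spanning the boundary is below threshold."""
    total = sum(counts.values())
    parts, start = [], 0
    for i in range(1, len(sequence)):
        rel = counts[sequence[i - 1 : i + 1]] / total if total else 0.0
        if rel < threshold:
            parts.append(sequence[start:i])
            start = i
    parts.append(sequence[start:])
    return parts


# Hypothetical toy data: one frequent compound dominates the token counts.
terms = {"情報検索": 50, "情報処理": 5, "検索語": 3}

for token_based in (False, True):
    counts = bigram_counts(terms, token_based)
    label = "token" if token_based else "type"
    print(label, counts["報検"], decompose("情報検索", counts, threshold=0.2))
```

Run as-is, this prints `type 1 ['情報', '検索']` followed by `token 50 ['情報検索']`. With type-based counts, the boundary bigram stays rare and 情報検索 splits into 情報 and 検索. Token-based counts leave it unsplit. This toy case is not evidence for the paper's experimental result; it only makes the two counting schemes concrete.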

Similar resources

A Kana-Kanji Translation System for Non-Segmented Input Sentences Based on Syntactic and Semantic Analysis

This paper presents a disambiguation approach for translating non-segmented Kana into Kanji. The method consists of two steps. In the first step, an input sentence is analyzed morphologically and ambiguous morphemes are stored in a network form. In the second step, the best path, which is a string of morphemes, is selected by syntactic and semantic analysis based on case grammar. In order to...

An Improved Token-Based and Starvation Free Distributed Mutual Exclusion Algorithm

Distributed mutual exclusion is a fundamental problem in distributed systems: coordinating access to critical shared resources. It concerns how the various distributed processes access the shared resources in a mutually exclusive manner. This paper presents a fully distributed, improved token-based mutual exclusion algorithm for distributed systems. In this algorithm, a process which...

Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents

We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...

An Audio-Based Approach to Mobile Learning of Japanese Kanji Characters

We describe the design and implementation of an audio-based computer system for mobile, nonvisual learning of the meaning and writing of "kanji" characters: the thousands of multi-stroke Chinese characters used in the Japanese logographic writing system. Our system is designed for use by non-native learners of Japanese as a foreign language. The key feature of our system is its innovative use o...

A Morpho-Syntactic Analyzer of Controlled Japanese

The proposed morpho-syntactic analyzer parses controlled Japanese texts such as articles in newspapers, technical magazines and professional journals and public documents that are transcribed wherever applicable by using Joyo Kanji (frequently used Chinese characters). The analyzer parses sentences in controlled Japanese texts into morpho-syntactic units, further dividing them into the content ...

Publication date: 2007